Make FreeRTOS a first-class citizen, enable multithread, multicore TCP/IP #3063
With the expanded performance and memory of the Pico 2, having a full operating system with threads could really improve developer life and get the best possible algorithm performance. Start by moving FreeRTOS checks from a global bool set by weak function linkage checks to a compiler definition. This can allow the build process to differ between bare metal and FreeRTOS builds (i.e. async_context implementations) and save a few bytes of program space.
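As a rough illustration of the difference between the two detection styles, here is a minimal sketch. It assumes the compiler definition is named `__FREERTOS` (the name used later in this PR); `hypothetical_rtos_start`, `rtos_detected_at_runtime`, `rtos_selected_at_build`, and `async_backend` are made-up names for illustration, not functions from the core. It also assumes GCC or Clang, since it relies on `__attribute__((weak))`.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical hook (not a real core symbol): a FreeRTOS build would supply a
// strong definition; otherwise the weak reference resolves to a null address.
extern "C" void hypothetical_rtos_start(void) __attribute__((weak));

// Old style: a run-time check of weak linkage. The branch and both code paths
// stay in the binary because the answer is only known after linking.
bool rtos_detected_at_runtime() {
    return &hypothetical_rtos_start != nullptr;
}

// New style: a compiler definition. The preprocessor picks one path, so the
// unused implementation is never compiled in.
constexpr bool rtos_selected_at_build() {
#ifdef __FREERTOS
    return true;
#else
    return false;
#endif
}

// The build can now choose an async_context flavor (names are illustrative).
const char* async_backend() {
#ifdef __FREERTOS
    return "freertos";
#else
    return "bare-metal";
#endif
}
```

Built without `-D__FREERTOS` and without a definition of the hook, both checks report a bare-metal build; adding `-D__FREERTOS` flips only the compile-time path.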
Implement the LWIP task (the only task allowed to call actual LWIP calls) using a work queue accessible from any other task/core. Move FreeRTOS into the main cores/rp2040 directory to allow for easier core usage. Dynamically build the proper async_context for raw or FreeRTOS in the IDE, not at libpico time.
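The single-owner work queue idea can be modeled on a host machine. The sketch below is not the PR's FreeRTOS code: `std::thread` stands in for the LWIP task, a `std::queue` guarded by a mutex stands in for the FreeRTOS queue, and `SingleOwnerWorker` and `call` are hypothetical names. It only shows the shape of the pattern: any thread may request work, but only the worker thread ever executes it.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>

// Host-side model: one worker thread is the only place queued work runs.
class SingleOwnerWorker {
public:
    SingleOwnerWorker() : worker_([this] { run(); }) {}

    ~SingleOwnerWorker() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stop_ = true;
        }
        cv_.notify_one();
        worker_.join();
    }

    // Callable from any thread: queue the work, wake the worker, then block
    // until the worker has run it and produced a result.
    int call(std::function<int()> fn) {
        std::packaged_task<int()> task(std::move(fn));
        std::future<int> done = task.get_future();
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(std::move(task));
        }
        cv_.notify_one();
        return done.get();
    }

    std::thread::id worker_id() const { return worker_.get_id(); }

private:
    void run() {
        for (;;) {
            std::packaged_task<int()> task;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return stop_ || !q_.empty(); });
                if (q_.empty()) return;  // stop requested and queue drained
                task = std::move(q_.front());
                q_.pop();
            }
            task();  // runs outside the queue lock, on the worker thread only
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::packaged_task<int()>> q_;
    bool stop_ = false;
    std::thread worker_;  // declared last so the members above exist first
};
```

A caller on any thread would write something like `int r = worker.call([] { return 42; });` and the lambda body would execute on the worker thread, which is the property the LWIP task relies on.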
The CYW43 driver can come up and start processing data. Unfortunately, when it needs to send data out through LWIP we have a deadlock. There is a CYW43 async_context semaphore owned by the calling task. In this case, the task is the periodic callback in "asyn_con" (i.e. the background timer).
1. When the timeout hits, the async context task is woken up and the first thing it does is take the async_context semaphore.
2. During background processing (sys_check_timeouts) an LWIP call is made.
3. The LWIP call sends a message to the LWIP task and wakes it up. The ASYN_CON task is now suspended waiting for the LWIP task's done notification. It holds the ASYN_CON semaphore while asleep.
4. LWIP does a bunch of stuff and tries to do an ethernet_output to send bits over the wire (i.e. accept DHCP or something).
5. Eventually LWIP's netif call stack reaches the CYW43 object (while still in the LWIP task), which tries to acquire the CYW43 semaphore and fails (because it's already held by the async task that's sleeping).
6. There is no 6, it's deadlocked at this point.
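The steps above can be reproduced in miniature on a host machine. This is a model under stated assumptions, not the driver code: a `std::mutex` stands in for the async_context semaphore, a second `std::thread` stands in for the LWIP task, and `worker_can_take_lock` is a hypothetical name. The worker uses `try_lock` so the sketch reports the failure instead of hanging; the real code blocks at that point, which is the deadlock.

```cpp
#include <cassert>
#include <future>
#include <mutex>
#include <thread>

// Returns whether the worker thread managed to take the context lock while
// the calling thread was waiting for the worker to finish.
bool worker_can_take_lock(bool caller_holds_lock) {
    std::mutex context_lock;  // stands in for the async_context semaphore

    // Step 1: the calling task takes the lock before doing background work.
    std::unique_lock<std::mutex> held(context_lock, std::defer_lock);
    if (caller_holds_lock) held.lock();

    // Steps 2-3: hand work to another thread and wait for its completion,
    // still holding the lock.
    std::promise<bool> result;
    std::thread worker([&] {
        // Steps 4-5: the output path needs the same lock. A blocking lock()
        // here would never return while the caller holds it.
        bool got = context_lock.try_lock();
        if (got) context_lock.unlock();
        result.set_value(got);
    });

    bool got = result.get_future().get();
    worker.join();
    return got;
}
```

With `caller_holds_lock` set to `true`, the worker cannot acquire the lock, mirroring step 5. Not holding the lock across the cross-thread wait removes the conflict.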
Deadlocks/pauses seem to be unavoidable when running Ethernet packet receive outside of the LWIP thread. Add a callback queue message to the LWIP thread to allow for it to run whatever ethernet processing needed. These messages still get stuffed by another simple periodic task.
IRQs will send a work queue callback instead of actually doing anything. Disable all IRQs when the work is noted, and then re-enable them when done.
The IRQ callback routine ended up passing in a stack variable address. For main app code this is legal because the app blocks until the LWIP call returns. But an IRQ returns immediately (it can't block), so the pointer to the stack variable we passed in will be dangling by the time the LWIP task uses it. Use a dumb static (non-stack) variable for now. W5100 is now running with multiple AdvancedWebServer clients in parallel with a WiFiClient MOTD process on core 1. Rewriting the CYW43 driver to work with this is TBD.
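A small host-side model of the lifetime problem follows. All names here (`IrqWork`, `post_from_irq`, `irq_handler_safe`, `drain_one`) are hypothetical, and a `std::deque` stands in for the work queue. The point is only that a non-blocking post returns before the consumer runs, so the posted pointer must refer to storage that outlives the handler.

```cpp
#include <cassert>
#include <deque>

struct IrqWork {
    int pin;  // illustrative payload
};

// Model of a work queue that stores pointers, not copies.
static std::deque<const IrqWork*> pending;

// Non-blocking post, as an IRQ needs: it returns before the work is consumed.
void post_from_irq(const IrqWork* w) {
    pending.push_back(w);
}

// Safe variant: the request lives in static storage, so the pointer is still
// valid after this handler returns. A local (stack) IrqWork would be dangling
// by the time the consumer dereferences it.
void irq_handler_safe(int pin) {
    static IrqWork work;
    work.pin = pin;
    post_from_irq(&work);
}

// Consumer side, run later (in the PR this would be the LWIP task).
int drain_one() {
    const IrqWork* w = pending.front();
    pending.pop_front();
    return w->pin;
}
```

A single static slot is overwritten if the handler fires again before the consumer runs, which is one reason to keep the interrupt masked until the queued work has been processed.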
Trying not to end up on thedailywtf.com
It is possible under heavy load with multithreading that a connection gets a tcp_abort just before tcp_accept is called. When we abort, we clear the this pointer, and so when we try to recover our object we crash. Check for aborted connections (and ignore them) in WiFiServer, like we do in WiFiClient.
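The guard described here amounts to a null check on the callback argument before it is cast back to an object. The sketch below is a simplified model, not the WiFiServer code: `ServerModel` and `on_accept` are hypothetical, and the callback signature is reduced to the one argument that matters (the stored this pointer).

```cpp
#include <cassert>

// Hypothetical stand-in for the server object recovered inside the callback.
struct ServerModel {
    int accepted = 0;
};

// Simplified accept callback. 'arg' models the stored 'this' pointer, which
// the abort path has already cleared to null for an aborted connection.
// Returns 0 when the connection was accepted and -1 when it was ignored.
int on_accept(void* arg) {
    if (arg == nullptr) {
        return -1;  // aborted before accept ran: nothing to recover, ignore it
    }
    static_cast<ServerModel*>(arg)->accepted++;
    return 0;
}
```

Without the check, the aborted case would dereference a null pointer, which matches the crash described above.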
The LWIP processor thread should initialize LWIP since it's the only one actually calling it directly. The LWIP thread can also handle the sys_check_timeouts calls for all drivers, so there is no need for the Ethernet loop to worry about that.
We track a list of all connections to allow the WiFi.stopAll() call (used for updates). This linked list needs to be protected from multiple cores updating it in parallel under FreeRTOS. Add a mutex around the ops.
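A minimal host-side sketch of a mutex-protected connection list is shown below. It is an illustration only: `Conn`, `ConnList`, and the method names are hypothetical, and `std::mutex` stands in for whatever FreeRTOS primitive the core actually uses.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical connection node in an intrusive singly linked list.
struct Conn {
    Conn* next = nullptr;
    bool stopped = false;
};

class ConnList {
public:
    // Push onto the head of the list under the lock.
    void add(Conn* c) {
        std::lock_guard<std::mutex> lock(m_);
        c->next = head_;
        head_ = c;
    }

    // Unlink the node if present; a no-op if it is not in the list.
    void remove(Conn* c) {
        std::lock_guard<std::mutex> lock(m_);
        for (Conn** p = &head_; *p != nullptr; p = &(*p)->next) {
            if (*p == c) {
                *p = c->next;
                c->next = nullptr;
                return;
            }
        }
    }

    // Mark every tracked connection stopped and return how many were visited.
    int stopAll() {
        std::lock_guard<std::mutex> lock(m_);
        int n = 0;
        for (Conn* c = head_; c != nullptr; c = c->next) {
            c->stopped = true;
            ++n;
        }
        return n;
    }

private:
    std::mutex m_;
    Conn* head_ = nullptr;
};
```

Without the lock, two threads running `add` at once can both read the same `head_` and one insertion is lost; with it, every update to the list is serialized.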
Simple CYW43 WiFiClient test runs
When in FreeRTOS, we don't use an async_context to drive our Ethernet NICs. Ifdef around any unused code/variables.
Blinking the LED doesn't work on all boards (CYW43), but panic() will show on any debugger quite nicely.
We write pointers to the data, make sure it doesn't bus fault
No need to add obscure __align__ if we don't pretend it's an array of bytes
Don't use any async_context for the __FREERTOS case because the extra context mutex causes deadlocks. Rewrite and simplify the CYW43 driver from the SDK to use FreeRTOS native primitives, no TaskHandle, and hook into the LWIP thread.
The non-CYW43 drivers look pretty good here. We have complete control over their architecture, making it much cleaner than the CYW43 driver included in the SDK. I've had it handling ~30 HTTP server requests/second(!) and 10 tasks (threads) of MOTD clients over a slow W5100 wired interface for >12hrs w/o a hiccup. The CYW43 driver in this mode is slow to connect initially but seems relatively stable. It will probably need to be rewritten from scratch, using the CYW43 library and dropping any of the SDK driver stuff, because
If you're doing wired Ethernet and FreeRTOS and LWIP, I'd recommend giving this a try. If you're using Platform.IO, add